Search results for "Statistically Validated Networks"
showing 8 items of 8 documents
Statistically Validated Networks for assessing topic quality in LDA models
2022
Probabilistic topic models have become one of the most widespread machine learning techniques for textual analysis. In this framework, Latent Dirichlet Allocation (LDA) (Blei et al., 2003) has gained increasing popularity as a text modelling technique. The idea is that documents are represented as random mixtures over latent topics, where a distribution over words characterizes each topic. Unfortunately, topic models do not guarantee the interpretability of their outputs. The topics learned by the model may be characterized only by a set of irrelevant or unrelated words, making them useless for interpretation. Although many topic-quality metrics were proposed (Newman et al., 2009; Alet…
MEASURING TOPIC COHERENCE THROUGH STATISTICALLY VALIDATED NETWORKS
2020
Topic models arise from the need to understand and explore large collections of text documents and to predict their underlying structure. Latent Dirichlet Allocation (LDA) (Blei et al., 2003) has quickly become one of the most popular text modelling techniques. The idea is that documents are represented as random mixtures over latent topics, where a distribution over words characterizes each topic. Unfortunately, topic models give no guarantee of the interpretability of their outputs. The topics learned from texts may be characterized by a set of irrelevant or unrelated words. Therefore, topic models require validation of the coherence of estimated topics. However, the automatic evaluation …
Statistically validated mobile communication networks: the evolution of motifs in European and Chinese data
2014
Big data open up unprecedented opportunities to investigate complex systems, including society. In particular, communication data serve as major sources for computational social science, but they have to be cleaned and filtered because they may contain spurious information due to recording errors, as well as interactions, such as commercial and marketing activities, that are not directly related to the social network. The network constructed from communication data can only be considered a proxy for the network of social relationships. Here we apply a systematic method, based on multiple hypothesis testing, to statistically validate the links and then construct the corresponding Bonferroni network, gen…
STRANIERI, MERIDIONALI O PROVINCIALI? I CONSUMI NEL TEMPO LIBERO DELLE SECONDE GENERAZIONI
2022
In this paper, we analyze leisure-time consumption patterns among young people belonging to the so-called “second generation” of immigrants in Italy. Leisure-time consumption describes how young immigrants use cultural products and services. We analyze data collected by ISTAT through the survey on the “second generations” (2015). A comparison of leisure consumption patterns between second-generation immigrants and their Italian peers does not show significant differences. Rather, differences in consumption styles are associated with gender (male/female), geographic area of residence (North/South), and size of the municipality of residence (large/small).
Households and their Expenditures as an Evolving Complex Social System
2020
Dynamics of fintech terms in news and blogs and specialization of companies of the fintech industry
2020
We perform a large-scale analysis of a list of fintech terms in (i) news and blogs in the English language and (ii) professional descriptions of companies operating in many countries. The occurrence and co-occurrence of fintech terms and locutions show a progressive evolution of the list of fintech terms into a compact and coherent set of terms used worldwide to describe fintech business activities. By using methods of complex networks that are specifically designed to deal with heterogeneous systems, our analysis of a large set of professional descriptions of companies shows that companies having fintech terms in their description present over-expressions of specific attributes of country, muni…
Statistically Validated Networks for evaluating coherence in topic models
2022
Probabilistic topic models have become one of the most widespread machine learning techniques for textual analysis. In this framework, Latent Dirichlet Allocation (LDA) has gained increasing popularity as a text modelling technique. The idea is that documents are represented as random mixtures over latent topics, where a distribution over words characterizes each topic. Unfortunately, topic models do not guarantee the interpretability of their outputs. The topics learned by the model may be characterized by a set of irrelevant or unrelated words, making them useless for interpretation. In the framework of topic quality evaluation, the pairwise semantic cohesion among the top-N most pr…
A primer on statistically validated networks
2019
In this contribution we discuss some approaches to network analysis that provide information about single links or single nodes with respect to a null hypothesis that takes into account the empirically observed heterogeneity of the system. With this approach, a selection of nodes and links is feasible when the null hypothesis is statistically rejected. We focus our discussion on approaches using i) the so-called disparity filter and ii) statistically validated networks in bipartite networks. For both methods we discuss the importance of using multiple hypothesis test correction. Specific applications of statistically validated networks are discussed. We also discuss how statistically validated netw…